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Abstract 

We propose a Bayesian inference approach for a class of latent Markov models. 
These models are widely used for the analysis of longitudinal categorical data, 
when the interest is in studying the evolution of an individual unobservable 
characteristic. We consider, in particular, the basic latent Markov, which does 
not account for individual covariates, and its version that includes such covari- 
ates in the measurement model. The proposed inferential approach is based on 
a system of priors formulated on a transformation of the initial and transition 
probabilities of the latent Markov chain. This system of priors is equivalent to 
one based on Dirichlet distributions. In order to draw samples from the joint 
posterior distribution of the parameters and the number of latent states, we 
implement a reversible jump algorithm which alternates moves of Metropolis- 
Hastings type with moves of split /combine and birth/death types. The pro- 
posed approach is illustrated through two applications based on longitudinal 
datasets. 
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1 Introduction 



The class of latent Markov (LM) models was introduced by Wiggins (1955, 1973) for 
the analysis of categorical longitudinal data. These models are specially tailored to 
study the evolution of an individual characteristic which is not directly observable. 
The basic LM formulation is similar to that of hidden Markov (HM) models for time 
series data (MacDonald and Zucchini, 1997); in fact, a latent Markov chain, typically 
of first order, is used to represent the evolution of the latent characteristic over time. 
Moreover, the response variables observed at the different occasions are assumed to be 
conditionally independent given this chain (assumption of local independence). The 
basic idea behind this assumption is that the latent process fully explains the observ- 
able behavior of a subject. Furthermore, the latent state to which a subject belongs 
at a certain occasion only depends on the latent state at the previous occasion. An 
LM model may also be seen as an extension of the latent class (LC) model (Lazarsfeld 
and Neil, 1968; Goodman, 1974), in which the assumption that each subject belongs 
to the same latent class throughout the survey is suitable relaxed. 

Typical applications of LM models are in studies of the human behaviors and 
conditions in health, education, sociology, and criminology. These models have also 
been adopted in economics to study the job market or customer's choice problems. 
In addition to Wiggins, among the first authors dealing with LM models, it is also 
worth mentioning Van de Pol and De Leeuw (1986), Van de Pol and Langeheine 
(1990), Collins and Wugalter (1992), and Langeheine and Van de Pol (1994). For a 
complete review of the state of art of the LM models, see Bartolucci et al. (2010). 

The basic LM model, relying on a homogenous Markov chain, has several exten- 
sions based on parameterizations that allow us to include hypotheses and constraints 
of interest. Generally speaking, these parameterizations may concern the conditional 
distribution of the response variables given the latent process {measurement model), 



and/or the distribution of the latent process {latent model). An example of an LM 
model based on constraints on the measurement model is given by the LM Rasch 
model (Bartolucci et al., 2008), which is a generalization of the model introduced by 
Rasch (1961) allowing each subject to evolve in his/her ability level. About the latent 
model, the most interesting constraints may be expressed on the transition matrix. 
In particular, transitions between two given states may be excluded, and/or certain 
elements of the transition matrix may be constrained to be equal (see, among others, 
Bartolucci, 2006; Vermunt et al., 1999). These parametrizations may also be exploited 
to include individual covariates in the measurement or in the latent model. The case 
of covariates included in the measurement model was dealt with by Bartolucci and 
Farcomeni (2009) among others, whereas the case of covariates included in the latent 
model, so that they affect the initial and the transition probabilities of the Markov 
chain, was dealt with by Vermunt et al. (1999) and Bartolucci et al. (2007b). In 
the present paper, we focus on LM models with constraints and individual covariates 
included in the measurement model only. 

In the frequentist approach, estimation of the parameters of an LM model is 
typically based on the maximum likelihood approach through the Expectation - 
Maximization (EM) algorithm (Baum et al., 1970; Dempster et al., 1977), whose 
implementation makes use of suitable recursions. As typically happens for latent 
variable models, the likelihood function of an LM model may be multimodal, and 
the search of the global maximum may be cumbersome, also due to the slowness to 
converge of the EM algorithm. Moreover, this kind of literature has not still provided 
a commonly accepted criterion for formal assessment of the number of the states of 
the latent chain, although information criteria are typically used. We refer, in partic- 
ular, to the Akaike Information Criterion (AIC), see Akaike (1973), and the Bayesian 
Information Criterion (BIC), see Schwarz (1978). On the other hand, in the Bayesian 
inference approach, parameter estimation does not suffer from the problem of mul- 



timodality of the likelihood and model selection is well principled. Obviously, the 
condition to apply this type of inference is that one is able to draw samples from the 
joint posterior distribution of the model parameters and the number of latent states. 

In this paper, we propose a Bayesian inference approach for the basic LM model 
and its extended versions based on suitable parametrizations of the conditional re- 
sponse probabilities given the latent states. These parametrizations may be used to 
formulate hypotheses of interest or include individual covariates. The approach is 
based on a system of priors that we propose following the approach for HM models of 
Cappe et al. (2005) and Spezia (2010). Instead of formulating the prior distributions 
directly on the initial and transition probabilities of the Markov chain, we formulate 
these distributions on unnormalized versions of these probabilities. In particular, we 
assume that each unnormalized initial and transition probability a priori has an in- 
dependent Gamma distribution with suitable hyperparameters. This system of priors 
considerably facilitates Bayesian model estimation from the practical point of view, 
while being equivalent to a system of priors based on Dirichlet distributions on the 
normalized probabilities. 

Under the above system of priors, we estimate the model parameters and select 
the number of latent states through a reversible jump (RJ) algorithm (Green, 1995). 
As is well known, this is a Markov chain Monte Carlo (MCMC) algorithm which rep- 
resents an extension of the Metropolis-Hastings algorithm (Metropolis et al., 1953; 
Hastings, 1970) that allows us to simulate samples from the posterior distribution 
when the parameter space has varying dimension. Our implementation of the RJ 
algorithm follows that proposed by Richardson and Green (1997) for Bayesian esti- 
mation of finite mixture models and that of the RJ algorithm of Robert et al. (2000) 
for estimation of HM models. In particular, this implementation is based on a series 
of transdimensional moves (i.e., split/combine and birth/death moves), which allow 
us to change the number of latent states. These moves are alternated with moves of 



MH type to draw samples from the posterior distribution of the model parameters, 
when the same number of latent states is held fixed. 

The paper is organized as follows. Section 2 illustrates the basic LM model for 
univariate and multivariate categorical longitudinal data, whereas its extensions are 
discussed in Section 3. The proposed system of priors is illustrated in Section 4. In 
Section 5 we describe the RJ algorithm to draw samples for the posterior distribution 
of the model parameters and the number of latent states. Finally, two applications are 
illustrated in Section 6. These applications are based on a dataset about marijuana 
consumption and a dataset about female labour participation. Finally, in Section 7 
we draw main conclusions about the proposed approach. 

2 Basic latent Markov model 

We introduce the preliminary concepts about the basic LM model for categorical 
longitudinal data, in which the conditional distribution of each response variable 
given the corresponding latent variable and the initial and transition probabilities of 
the latent process are unconstrained. 

2.1 Formulation of univariate responses 

In the univariate case, let Yi = (y/ , . . . , ), i = 1, . . . ,n, denote a sequence of 
T categorical response variables with / levels or categories, coded from to / — 1, 
independently observed over n subjects, that correspond to repeated measurements 
on the same subject at different occasions. 

The main assumption underlying the basic LM model is that of local independence, 
i.e. for every subject the response variables are conditionally independent given a 

(1) (T) 

latent process Ui = {Ur\ ■■■M )■ This latent process is assumed to follow a first- 
order Markov chain with state space {1, . . . , /c}. Then, for all t > 2, the latent variable 



U-*^ is conditionally independent of U^^\ . . . , U-*~'^^ given U-*~^\ 

Parameters of the model are the conditional response probabilities (py^^ = piY^*^ = 
y\U-^^ = u), t = 1, . . . ,T, u = 1, . . . ,k, y = 0, — 1, the initial probabilities 
TTn = p{Ui^^ = u),u = l,...,k, and the transition probabilities vr^iu = p{u'f' = 
v\ul'^ = m), t = 2, . . . ,T, u,v = l,...,k. Note that the latent process is time 
homogeneous, so that the transition probabilities do not depend on t, moreover the 
initial probabilities are completely unconstrained. Furthermore, all these probabilities 
do not depend on i since, in its basic version, the model does not account for individual 
covariates. 

The assumptions above imply that the distribution of Ui may be expressed as 

p{Ui = u) = 7r„(i, Yl vr„(0|«(t-i), 
t>i 

where u = {u^^\ . . . ,u^'^^). Moreover, the conditional distribution of Yi given Ui 
may be expressed as 

p{Yi = y\Ui = u)= l[(pyl)\^w, 

t 

and, consequently, for the manifest distribution of Yi we have 
f{y) = piY, = y) = J2piY, = y\U, = u)p{U, = u) = 

u 

where y = (y^^\ . . . , y^^'^) . In order to efficiently compute f{y) we can use a for- 
ward recursion (Baum et al., 1970), for obtaining q'^^\u^y) = p{uf' = u,y}^'^ = 
y^^\ . . . , f/*^ = y*^*^) for t = 1, . . . , T. The recursion is as follows: given q'''^~^\u, y), 
u = 1, . . . ,k, the t-th iteration consists of computing 

^y^y) = 0W ^ ^ {u , y)n,\^, V = 1 , . . . , k , 



starting with q^^\u,y) = 7r„0 (i), • We then have, 



/(?/) = E^^'^H^,?/)- 



u 



The above recursion may be efficiently implemented using the matrix notation, and 
let f{y) = q^'^\y)'l where 1 is a column vector of ones of suitable dimension and 
q^^\y) is a column vector with elements q^^\u,y). The recursion is then expressed 
as: 



with TT = {ttu, u = 1, . . . , k} denoting the initial probability vector, (p^' = {(py^^, u = 
1, . . . ,k} denoting the conditional probability vector and 11 = {n^\u, u,v = 1, . . . , k} 
denoting the transition probability matrix. 

Finally, for an observed sample of n subjects, let y^ denote the observed response 
vector provided by subject i, the model likelihood may be formulated as L{y\6) = 
Hi f{yi)i where 6 is the vector of all model parameters arranged in a suitable way. 

2.2 Multivariate version 

In the multivariate case, we observe a vector of r response variables, denoted by 
Yf^ = . . . , yj^.*"*), for every subject i and occasion t, with i = l,...,n and 

t = 1, . . . , T. Each response variable has categories, j = 1, . . . , r, coded from to 
Ij — 1- Moreover, all responses provided by subject i are collected in the vector Yi. 
The assumption of local independence is usually formulated by also requiring that 
the elements of each vector Y^^ are conditional independent given . 
The model assumptions imply that 




(2) 



p{Y, = y\U, = u)= IIp{Y? = y'-'^lU^'^ = nW), 



(3) 



where y is made of the subvectors t/^*) = (?/| , . . . , y'^^) and 

p(yf = !/<''|y^"' = «")) = ^e■.M..■ 

3 

With 4>fy^^ = p{Yl^ = y|f/f) = n), J = l,...,r, t = 1,...,T, n = l,...,k. The 
manifest probabihty f{y) has the same expression as in (1), with p{Yi = y\Ui = u) 
computed as in (3), and it may be computed by exploiting the recursion rule, along 
similar line as in (2). The likelihood has the same expression as in the univariate 
categorical data. 

3 Constrained and extended versions of the basic 
model 

In the basic LM model outlined in the previous section, all the probabilities are 
completely unconstrained. There are two generalizations which may be of interest 
and commonly arise in applications. First, we may put restrictions on the parameter 
space, in order to give a more parsimonious and easily interpretable model. Secondly, 
we may have observed covariates together with the outcomes. Both generalizations 
may concern either the distribution of the response variables (i.e., the measurement 
model) or the distribution of the latent process (i.e., the latent model). For a more 
detailed description see Bartolucci et al. (2010). 

3.1 Constrained versions 

We discuss here only the constraints on the measurement model in order to parameter- 
ize the conditional response probabilities. In the univariate case a sensible constraint 
may be 

(t^tw = (t>y\u, t = l,...,T, u = l,...,k, y = 0,...,l-l. (4) 



This constraint corresponds to the hypothesis that the distribution of the response 
variables only depends on the corresponding latent variable and there is no depen- 
dence of this distribution on time. 

Other interesting constraints may be expressed by 

r?!*^ = Z^^(3, (5) 

where 77^*^ = &(0o||j, • • • , 4>fli\y)^ with g{-) being a suitable hnk function, Z^^ being a 
design matrix and (3 being a vector of parameters. 

In the case of binary response variables, we can parameterize the conditional 
probabilities through the logit link function r^^^*) = log(0^*^/0o*|j). With response 
variables having more than two categories, a natural choice is that of multinomial 
logit link function, so that ri^\y) = log(0^*^„/0o*i)? U = ■ ■ ■ J ~ ^- However when 
the response variables have an ordinal nature, global or continuation type logits are 
more suitable; see Bartolucci (2006). For instance, in the case of binary variables, by 
assuming that 

r]^^=C,-J'\ t = l,...,T, M = l,...,fc, 

we can formulate a LM version of the Rasch model (Rasch, 1961), which finds a 
natural application in psychological and educational assessment. In this case, the 
parameters (u are interpreted as ability levels. 

In the multivariate case, constraints (4) becomes 

^Jl\u = ^j,y\^^ 3 = ^,---,r, t = 1, . . . , T, m = 1, . . . , fc, y = 0,...Jj-l. (6) 

Moreover, we can use a link function of the type 



(7) 



in order to parameterize the conditional distribution of each response variable as in 
(5), with rifl = 9,i<l^\^,---,<§^.,\J. 

3.2 Extended versions based on the inclusion of individual 
covariates 

As discussed above, the covariates can be included both in the measurement model 
and in the latent model. In the former case, the conditional distribution of the 
response variables given the latent states may be parameterized by generalized logits. 
This parametrization recalls that used in (5), for the univariate case, and in (7), 
for the multivariate case, to formulate the constraints on the measurement models. 
Note that, using this formulation, the assumption of local independence is relaxed 
by allowing association between the response variables observed at the same occasion 
even conditional on the latent state. 

About the model interpretation, when the covariates are included in the mea- 
surement model, the latent process is seen as a way to account for the unobserved 
heterogeneity between subjects. The advantage with respect to a standard random 
effect or latent class model with covariates is that we admit that the effect of un- 
observable covariates could be non constant over time, but it could have its own 
dynamics. 

When the covariates influence initial and transition probabilities of the latent 
process, we suppose that the response variables measure and depend on the latent 
variable (e.g. the quality of life), which may evolves over time. In such a case, the 
main research interest is in modeling the effect of covariates on this latent variable 
distribution (Bartolucci et al., 2009). 

In this paper, we deal with a model very similar to that proposed by Bartolucci 
and Farcomeni (2009), that is a multivariate extension of the basic LM model in 



which the conditional distribution of the response variables depends on the individual 
covariates. This extension is illustrated in the following. 

Let xf\ denote the vector of individual covariates for subject i at occasion t. 
Following the formulation of Bartolucci and Farcomeni (2009), we parameterize the 
conditional distribution of the response variables given the latent process by a multi- 
variate marginal link function (Bartolucci et al., 2007a). In particular, let denote 
the column vector having elements p(l^f'* = ylu'f' = u, xf^) for all the possible con- 
figurations y of the responses. This probability vector is parameterized by marginal 
logits and marginal log-odds ratios which are collected in the vector rifl that may be 
simply expressed as 

r;12 = Clog[Mpg], (8) 

where C and M are appropriate matrices whose construction is described in Bar- 
tolucci et al. (2007a); see also Colombi and Forcina (2001). Logits and log-odds ratios 
may be of local, global, or continuation type; the choice is driven by the nature of 
the response variables, essentially ordinal or non-ordinal. 

To relate the above marginal effects to the covariates, we assume that 

= iu + X?(3, r,^^l, = 7, (9) 

where rji^^^^ is the subvector of rjfj^ containing the logits and r/j^l^^ is the subvector 
containing the log-odds ratios. Moreover, xf^ is a suitable design matrix defined 
on the basis of xf\ whereas /3 and 7 are vectors of parameters. Note that 
u = 1, . . . ,k, may be seen as support points, corresponding to each latent states, for 
individual random effects which are time-varying. 

As discussed above, the resulting model allows for unobserved heterogeneity be- 
yond individual covariates; moreover, the effect of the first is admitted to be time- 
varying. This extension is of interest when we want to investigate on the direct effect 



of the covariates on the response variables. 



4 Bayesian setting 

The basic LM model and its extended versions are considered here in the Bayesian 
setting; at this aim, we introduce the system of priors elicited for the dimension and 
the unknown model parameters. 

4.1 Basic latent Markov model 

In specifying the prior distributions for the initial and transition probabilities we 
follow the approach of Cappe et al. (2005) and Spezia (2010), who exploit a trans- 
formation (based on unnormalized probabilities) which facilitates the estimation. In 
particular, we let 7r„ = X^/ where \u, are assumed a-priori independent, with 

distribution Ga{5u, 1) for m = 1, . . . , fc; similarly, 7r„|„ = A™/ ^^uw where A™, are 
assumed a-priori independent, with distribution Ga{5uv, 1) for M,f = 1, . . . , k. The 
Xuv are not identified, but this transformation facilitates the MCMC moves, since 
it relaxes the constraints on the initial and transition probabilities (see also Cappe 
et al., 2003). Typically, the hyperparameters are chosen as 5fU = 1, u = l,...,k, 
and 6uv = k ■ I{u = v) + 0.6 ■ I{u v), u,v = 1, . . . , k, where I (A) is the indicator 
function. With the latter choice, as usual in LM models, the probability of persistence 
is greater than the probability of transition. This system of priors results equivalent 
to a system based on Dirichlet distributions. In the following, we denote by A and A 
the vector and the matrix with elements A„ and Xuv, respectively. 

We also consider the same reparametrization for the conditional response probabil- 
ities, through the vectors '!/'[,*■' with elements as (p^^^ = ipyu/ J2h'^hl- We assume 
an independent Gamma prior distribution for ip^*^ ~ Ga(S^*^, 1), choosing 5^^ = 1 for 
y = 0, . . . , / - 1, M = 1, . . . , /c, t = 1, . . . ,T. 



Finally, for the parameter k we define a discrete Uniform prior distribution be- 
tween 1 and /Cmax; where /Cmax is the maximum number of states we admit a priori. 
Usually /cmax is greater than the most complex model that could be visited by the 
algorithm; we choose /Cmax = 10. 

The above setting can be easily extended to the case of multivariate categorical 
data. 

4.2 Constrained and extended versions 

In the constrained versions expressed by (5) and (7), once defined a system of priors 
for the initial and transition probabilities and for the number of latent states k, as 
in Section 4.1, it only remains to choose a prior distribution for the generic vector of 
parameters /3. It is natural to assume f3 ~ A^(0, cr^I), where and I are respectively 
a vector of zeros and an identity matrix of suitable dimension. The choice of (t| 
depends on the constraints adopted and on the context of application. Typically, 
5<al< 10. 

About the model based on assumption (9), which allows for the inclusion of indi- 
vidual covariates, we assume that the vectors ^„ are a-priori independent with dis- 
tribution A^(0,(t|/). Similarly, we assume that /3 ~ iV(0,cr^/) and 7 ~ A^(0,cr^/). 
Also in this case we assume 5 < o"| = cr| = cr^ < 10. Again, concerning the initial 
and transition probabilities and the number k of latent states, are still valid the prior 
assumptions defined in Section 4.1. 

5 Reversible jump algorithm 

In this paper, we propose a framework for Bayesian inference on LM models, imple- 
menting a RJ algorithm which draws samples from the posterior distribution of the 
parameters and simultaneously from that of the number of latent states. The pro- 



posed framework has many points in common with those developed for HM models 
(Robert et al., 2000; Spezia, 2010) about the specification of the priors and/or the 
structure of the estimation algorithm. 

In particular, the proposed algorithm is based on two different types of move. The 
moves of the first type are aimed at updating the parameters of the current model 
given the number of states; those of the second type also allow us to update the 
number of states. In more detail, the algorithm performs the following steps: 

Step 1: Metropolis-Hastings (MH) move in order to draw, given the current k, the 
parameters from their posterior distributions. 

Step 2: split /combine move (each proposed with probability 0.5). The split proposal 
consists of choosing a state at random and splitting it into two new ones. The 
corresponding parameters are split using auxiliary variables. In the combine 
move a pair of states is picked at random and merged into a new one, so as to 
recover the values of the auxiliary variables of the split move. 

Step 3: birth/death move (each proposed with probability 0.5). The birth move is 
accomplished by generating a new state and drawing the new parameters from 
their respective priors. In the death move a state is selected at random and 
then deleted along with the corresponding parameters. 

This structure closely recalls the one of the RJ algorithm for mixture models 
proposed by Richardson and Green (1997), although the birth and death moves are 
not limited to the empty components (see Spezia, 2010; Cappe et al., 2005, Ch. 13). 
Moreover, in the first type of move, the simulation from the posterior density is accom- 
plished through the MH step instead of implementing a Gibbs sampler. Furthermore, 
our implementation does not simulate the latent process since it directly exploits the 
manifest (or marginal) distribution, which is computed by a suitable recursion (Baum 
et al., 1970). We decide to simulate from the posterior distribution of the parameters 



using random walk MH moves without completion because the resulting algorithm is 
easier to implement within the RJ framework. Moreover, as pointed out in Celeux 
et al. (2000), the Gibbs sampler is not always appropriate for sampling from a multi- 
modal distribution, because it is not always able to explore the posterior surface and 
to escape local mode (see also Jasra et al., 2005). Finally, instead of passing through 
each step deterministically, we choose to randomly select, at each iteration, among 
split /combine and birth/death step with probability equal to 0.5. The MH step is 
always performed. 

A well-known problem occurring in Bayesian mixture modeling, is the label switch- 
ing problem that can be seen as the non-identifiability of the component due to the 
invariance of the posterior distribution to the permutations in the parameters label- 
ing. Several solutions have been proposed in the literature; in particular, we cite: 
artificial identifiability constraints (Diebolt and Robert, 1994; Richardson and Green, 
1997) and the related random permutation sampling (Friihwirth-Schnatter, 2001), 
the relabeling algorithm (Celeux, 1998; Stephens, 2000) and the label invariant loss 
function methods (Celeux et al., 2000; Hurn et al., 2003). For a general review see 
Jasra et al. (2005) and Spezia (2009). Since this issue is of great complexity, we 
decide to use relabeling techniques retrospectively, by post-processing the RJ output 
as in Marin et al. (2005). In particular, the label switching is managed by sorting 
the MCMC sample of the draw obtained at the end of the iterations on the basis of 
the permutation of the states which minimizes the distance from the posterior mode. 
More details are given in Section 5.1. 

5.1 The algorithm 

We now describe in more detail the three steps of the RJ algorithm which allow 
for the estimates of the model parameters and the unknown number of states k. 
Concerning the constrained and the extended versions, we only illustrate these steps 



for the multivariate LM model with covariates formulated on the basis of assumption 
(9); the steps for LM versions based on different assumptions may be easily derived. 

Step 1- Metropolis Hastings move 

In the first step of the algorithm, with fixed k, the model parameters are drawn from 
their posterior distribution on the basis of separate multiplicative random proposals. 
About the unnormalized initial, transition and conditional probabilities, we consider 
a logarithmic transformation of the positive quantities A„ A„„, and ip^^, in order to 
mapped them onto the real line. The proposed moves are: 

1. log a; = log A„ + e„, with e„ ~ iV(0, tx) for m = 1, . . . , fc. 

2. logA;,, = log A„„ + e™, with e^,^ ~ A^(0, ta) for u,v = 1, . . . ,k. 

3. (a) For the basic LM model, 

- logV'W* = log^Aj*2 + eW, with eW ~iV(0,r^) for y = 0, 1, u = 
1, . . . , /c, 

t = l,...,T. 

(b) For the extended model with covariates, 

- ^hu = ihu + ehu, with eh,u ~ iV(0, r^) for /i = 1, . . . , n^, m = 1, . . . , fc. 

- (3* = A + ^i, with ei ~ A^(0, Tp) for i = 1, . . . , np. 

- 7* = 7j + ej, with ~ A^(0, r^), for j = 1, . . . , n^. 

Note that n^, np and are the dimension of the vectors (3 and 7, i.e. respectively, 
the number of marginal logits, the number of parameters, and the number of log-odds 
ratios. The acceptance probabilities of the proposed values, for both versions, include 
the Jacobian that arises because we work with a log-scale transformation. This is 
given by n«,t, A;„/A„,„ n« A;/A„ and ny,«,t V^^VV^S^ respectively. 



Step 2 - split/combine move 

Suppose that the current state of the chain is {k, 0k)', in this step we choose between 
spht and combine move with probabihty 0.5. Obviously, when /c = 1, we always 
propose a split move, while when k = /Cmax we propose a combine move. 

In the split move a state Uq is randomly selected and spht it into two new ones, 
Ul and U2. The corresponding parameters are split as follows: 

1. Split Au„ as 

Ki = KoP, A„2 = A„(3(l-p) with p ~ f/(0, 1). 

2. Split column uq of A as 

A„„i = Xuuo Pu, Ku2 = Am„o(1 - Pu) with pu ~ U{0, 1) for u ^ uq. 

3. Split row Mo of A as 

4. Split A„o 

A-iil Ul ^U{)U{)Puq^U\'; ^U\U2 A^q^q(1 Piiq)^U2'1 

All2 «1 ^UQUqPuq / ^U\1 ^U2U2 ^UQUoi^ Puq)/'^U21 

with ~ f/(0, 1) and 'dui,'^u2 ~ Ga{a^,b^). 

5. (a) For the basic LM model, split ip^l^^ as 

with t?!^*) ~ Ga{a^, b^) for ?/ = 0, — 1, t = 1, . . . , T. 
(b) For the extended model with covariates, perturbate $,huo as 

Chui = ^huo-^h, ihu2 = Chuo+^h, with Eh ~ N{0,t'^) for h = I, . . .,n^. 



It is worth noting that the densities of the proposals are identical, as p and 1 — p have 
the same distribution and likewise for d and \/d and e and —e respectively (here 
subscripts on these variables are omitted), so the symmetry constraints are satisfied. 

In the reverse combine move two distinct states, ui and U2, are picked at random 
and merged into a single state uq, so as to preserve reversibility: 

+ \uu2 for u ^ Uq. 
3. Xuov = (A„it, A„2^,)^/^ for V ^ uq. 

A \ — (\ \ U/2 I i' \ \ U/2 

5. (a) For the basic LM model ipfl^ = for y = 0, 1. 

(b) For the extended model with covariates ^huo = {^hui + ihu2)/'^ for h = 
l,...,n^. 

Note that the split / combine move does not influence the parameters /3 and 7 as they 
are not affected by the number of states. 

The split move is accepted with probability min{l. A} whereas the combine move 
is accepted with probability min{l, y4~^}. In the basic LM model, A can be computed 

as 
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(10) 



When we deal with the model assumption (9), the above formula becomes 



LiylOk+MOk+Mk + l) (^ + 1)! Pc{k + l)/[{k + l)k/2] 

Ps{k)/k 



J\ 

L{y\Ok)p{0k) Psik) pi^uM^u,) Uv^uo PiA') UkP{eh) ' 

(11) 

In both the equation, L{y\Ok) represents the likelihood computed via the forward 
algorithm, while p{6k) is the prior distribution of all model parameters. Moreover, 
Ps{k)/k and Pc{k+ l)/[{k + l)k/2\ are respectively the probabilities to split a specific 
component out of k available ones, and to combine one of [k + l)k/2 possible pairs 
of components. We also note that p{k + l)/p{k) cancels out and that the Uniform 
variables involved have densities equal to unity. The factorials and the coefficient 2 
arise from combinatorial reasoning related to label switching. | J| is the Jacobian of the 
transformation from 0^ to which is the product of five determinants | Ji| = A^q, 

\J2\ = Ylu^uo^uui)-, \J?,\ = '^^"^Ylvjtuo'^uovMv, \Ja\ = ^^liouoPuoi^ ~ Puo)Mui'^U2 ^ud 

Step 3 - birth/death move 

This step is performed with probability 0.5, along similar lines as split/combine move. 

The birth move is accomplished by generating a new state, denoted by mq, drawing 
the new parameters from their respective priors. The remaining parameters are simply 
copied to the proposed new state In the death move a state Uq is selected at 

random and then deleted along with the corresponding parameters. 

The acceptance probability of the birth move is min{l,y4}, where A may be ex- 
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pressed by the following formulas 
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that can be applied to the basic LM model and to the model with covariates, respec- 
tively. 

The death move is accepted with probability min{l,74~^}. Since the proposal 
densities are equal to the priors of the corresponding parameters, and because the 
components in Ok remain the same in 6k+i, many terms cancel out in the expression 
above. Note that \ J\ = 1. 

Post-processing method 

At the end of the iterations of the algorithm, we select the model with the highest 
posterior probability of the number of state k, i.e. the model that has been visited 
most often by the RJ algorithm, after discarding the burn-in period. After that we 
collect the MCMC sample of the draws obtained when the best model was visited. 
Then it is possible to compute the ergodic averages of those parameters, as f3 and 
7, that are not affected by the number of states. Concerning the remaining model 
parameter estimates, and in order to tackle the label switching problem, we need 
to apply the post-processing method, as in Marin et al. (2005). In particular, for 



a sample 6^^\ . . . , 6^^^ drawn from the posterior distribution of the parameters of 
a model with k latent states, the post-processing method is based on the following 
steps: 

1. compute the posterior mode 6 as: 

e = argmax^^i^ jv L{y\e^^)p{e^'^). 

2. Let hi^O^"^^) denote the vector O'^'^^ with elements permuted according to certain 
permutation of the latent states and let % denote the space of all possible 
permutations. 

3. For i = 1,...,N, substitute 0*-*^ with the corresponding permutation which 
minimizes the distance from 6, i.e., 

argmin,g^||M0«)-^||. 

6 Empirical illustrations 

To illustrate the Bayesian inference for the class of LM models proposed in this 
paper, we describe the analysis of two real datasets. The first one concerns the use 
of marijuana among young people and it is analyzed using the basic LM model and 
a model with constraints on the conditional response probabilities. The second one 
is a dataset extracted from the database derived from the Panel Study of Income 
Dynamics, and is about fertility and female participation to the labor market. In this 
case, to fit the data, we use the more complex LM model based on assumption (9), 
which allows for the presence of covariates. 



6.1 Marijuana consumption dataset 

The marijuana consumption dataset has been taken from five annual waves (1976- 
1980) of the National Youth Survey (Elliott et al., 1989) and is based on n = 237 
respondents who were aged 13 years in 1976. The use of marijuana is measured 
through T = 5 ordinal variables, one for each annual wave, with I = 3 levels coded as 
for never in the past year, 1 for no more than once a month in the past year and 
2 for more than once a month in the past year. We want to explore whether there is 
an increase of marijuana use with age. 

As illustrated by Vermunt and Hagenaars (2004), a variety of models may be used 
for the analysis of this dataset but a LM approach is desirable for its fiexibility and 
easy interpretation (see Bartolucci, 2006). 

In our implementation, we used the system of priors outlined in Section 4.1 with 
Su = 1, Suv = k-I{u = v) + 0.6- I{u 7^ v), and S^^ = 1, for the unnormalized initial and 
transition probabilities and for the conditional response probabilities, respectively; 
^max was set equal to 10. Moreover, in the MH step, the parameters were updated 
for fixed k through an increment random walk proposal on each logA^j, logA^^^, and 
log-ip^}^, with Tx = 0.5, ta = 0.1 and = 0.2. The sampler parameters were tuned so 
as to achieve acceptance rates in the range 0.1 — 0.25 for all values k < 10. In the split 
move we used ax = = = 1 and bx = = = 1 as parameters of the Gamma 
distributions. We also decided to simplify the model using constraint (4), in which 
the conditional probabilities were assumed to be time homogeneous, i.e. (f)y^^ = (j)y\u 
and analogously, ^p^^ = ifjyu- Finally, the algorithm ran for 1, 000, 000 iterations with 
a burn-in of 200, 000 iterations. The starting values were randomly chosen. The 
acceptance rates are illustrated in Table 1. Concerning the transdimensional moves, 
we note that the acceptance rates are a bit lower than desired, but if we consider that 
split /combine or birth/death moves only involve a change of model dimension, and 



all the other parameters are updated in each sweep, they are not too low (see Robert 
et al., 2000, for comparable results). The estimated posterior probabilities are 0.689, 





Performed 


Accepted 


% Accepted 


MH with fixed k 


1,000,000 






Initial probabilities 




209,312 


20.93 


Transition probabilities 




127,815 


12.78 


Conditional probabilities 




133,458 


13.35 


Birth 


250,268 


2,129 


0.85 


Death 


250,229 


2,117 


0.85 


Split 


250,107 


770 


0.31 


Combine 


249,396 


788 


0.32 



Table 1: Acceptance rates for MH move, split/combine and birth/death move under 
the basic LM model 

0.277, 0.031, 0.002 for = 3, 4, 5, 6 and below 0.001 for smaller and larger values of k. 
Hence, the most probable model is that with three latent state, the same as selected 
by Bartolucci (2006) using BIG. 

In order to face the label switching problem, at the end of all iterations, we 
post-processed the output as illustrated in Section 5.1. Moreover, to have a clearer 
interpretation of the results, the MCMC draws were sorted on the basis of the condi- 
tional probabilities of the last category of the response variables. Doing this, the last 
class of the LM model may be interpreted as that of subjects with high tendency to 
use marijuana. Once the output was post-processed, we estimated the parameters of 
the model by the ergodic averages, taken over the final 800,000 iterations; the result- 
ing parameter estimates are reported in Table 2. We can see that these results are 
very similar to that obtained by Bartolucci (2006) with the EM algorithm. 

We also tried to put constraints on the distribution of the response variables, so 
as to give a proper interpretation of the model. We assumed a parametrization for 
the conditional local logits of any response variable given the latent state: 

Vy\u = '^og-p^ = Cu + uJy, u = l,...,k, y = l,...,l-l, (14) 

(Py-l\u 
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u = 1 


u = 2 


u = 3 
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0.868 


1 


0.847 


0.128 


0.025 
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0.080 


2 


0.073 


0.693 


0.233 
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0.052 
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0.016 


0.065 


0.919 



Table 2: Estimated initial probabilities and transition probabilities under the basic LM 
model 

where (u may be interpreted as the tendency to use marijuana for a subject in the 

state u and Uy, for y = 1, ...,/ — 1, are the cutpoints common to all the response 

variables. Note that we assumed again the constraint (4), i.e. the distribution of the 

response variables does not depend on time. The parametrization used here requires 

the choice of the prior distributions on and Uy, that we assumed to be A^(0, a^) 

and N{0,a^), respectively. We considered two choices of the prior parameters, i.e. 

0"^ = 0"^ = 5, 10, for all M = 1, . . . , and y = 1, ...,/ — 1, in order to see how 

the posterior distribution of the number of states changes with different values of the 

hyperparameters. The sensitivity to prior specification and therefore the choice of the 

hyperparameters is in fact one of the difficulties in Bayesian modeling, especially when 

there is little information to be used. We computed the posterior distribution of k also 

considering two different values of the parameters 6uv for the transition probabilities, 

i.e. 6uv = k ■ I{u = v) + 0.6 ■ I{u ^ v)* and 6uv = 1- The prior parameters for the 

initial probabilities were left unchanged, i.e. (5„ = 1 for m = 1, . . . , fc. 

In the MH step, the elements of both the parameters (u and Uy, were updated 

through a normal random walk proposal, A^(0,0.5); moreover in the split move the 

parameter Qo was split into = C«o ^ and C«i = Cno + <^u, where ipu ~ N{0, 0.2), 

with u = 1, . . . ,k, whereas in the reverse combine move the selected two parameters 

were combined into (uq = (C^i +Cu2)/2- The parameters setting for the unnormalized 

initial and transition probabilities was the same as in the basic LM model. The results 
*indicated in Table 3 as (fc — 0.6) 



were based on 1,000,000 iterations of the algorithm after a burn-in of 200,000 sweeps. 

Table 3 shows the results of the sensitivity analysis to prior specification. We can 
see that almost all the values of the hyperparameters lead to choose again a model 
with three latent states, even if the posterior probabilities of k are quite different. 
The first column of the table shows our choice in this application, that it seems to be 
the more adequate given the prior information we have. 
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< 2 


0.000 


0.000 


0.000 


0.000 
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0.474 


0.341 


0.932 


0.915 
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0.365 


0.361 


0.067 


0.082 
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0.122 


0.189 


0.001 


0.003 
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0.031 


0.075 


0.000 


0.000 
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0.007 


0.025 


0.000 


0.000 


> 8 


0.001 


0.010 


0.000 


0.000 



Table 3: Posterior distribution of the number of states k for different choices of the 
hyperparameters under the constrained LM model 



The acceptance rates for the random walk MH move and for the dimension chang- 
ing moves are illustrated in Table 4. We can see that these rates, for split /combine 
and birth/death moves, are higher than those achieved in fitting the basic LM model. 
Figure 1 shows the mixing and the stationarity of the algorithm with the plot of the 
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% Accepted 


MH with fixed k 


1,000,000 






Initial probabilities 




195,551 


19.56 


Transition probabilities 




129,441 


12.94 


Cu 




176,501 


17.65 


UJy 




185,274 


18.53 


Birth 


250,348 


5,543 


2.21 


Death 


249,399 


5,481 


2.20 


Split 


250,199 


1,611 


0.64 


Combine 


250,054 


1,677 


0.67 



Table 4: Acceptance rates for the MH move, the split/combine and the birth/death 
move under the constrained LM model with c"^ = cr^ = 5 and 6uv = k ■ I{u = 
v) + 0.6 ■ I{u ^ v) 



first 50,000 values of k after the burn-in, and the plot of the cumulative occupancy 
fractions for different values of k against the number of sweeps. From Figure 1(b) we 
can see that the burn-in is adequate to achieve stability in the occupancy fractions. 
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Figure 1: (a) Number of latent states in the first 50,000 iterations after the hum-in 
(h) Cumulative occupancy fractions for k = 3,4,5 under the constrained LM model 
with cr^ = cr^ = 5 and S^v = k ■ I{u = v) + 0.6 ■ I{u ^ v) 



Tables 5 and 6 show the parameter estimates, computed after post-processing the 
MCMC output. Even in this case, the latent states may be ordered, representing 
subjects with "no tendency to use marijuana", "incidental users of marijuana" and 
"high tendency to use marijuana". 



u 


L 


y 




1 


-5.321 
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0.775 
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-0.176 
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-1.977 


3 


4.173 







Table 5: Estimates of the parameters Qu and Uy under the constrained LM model with 
cr^ = (T^ = 5 and 6uv = k ■ I{u = v) + 0.6 ■ I{u ^ v) 



From the results, we can see that most subjects starts with a low tendency to 
drug consumption but from the estimated marginal probabilities of the latent classes 
emerge that the tendency to use marijuana increases with age, since the probability 
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u = 3 


1 


0.897 
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0.838 


0.148 


0.015 
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0.077 
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0.056 


0.717 


0.227 
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0.026 
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0.027 


0.058 


0.915 



Table 6: Estimated initial probabilities and transition probabilities under the con- 
strained LM model with cr"^ = cr"^ = 5 and Suv = k ■ I{u = v) + 0.6 ■ I{u ^ v) 

of the third class increases across time. From the estimated transition matrix we can 
see that a large percentage of subjects remains in the same latent class, but around 
23% of incidental users switches to the class of high frequency users. 

6.2 Analysis of the Panel Study of Income Dynamics dataset 

The second dataset analyzed in this paper is very similar to that used by Hyslop 
(1999) and by Bartolucci and Farcomeni (2009). The dataset was extracted from 
the database derived from the Panel Study of Income Dynamics, which is primarily 
sponsored by the National Science Foundation, the National Institute of Aging, and 
the National Institute of Child Health and Human Development and is conducted 
by the University of Michigan. The database is freely accessible from the website 
http://psidonline.isr.umich.edu, to which we refer for details. 

Our dataset concerns n = 482 women who were followed from 1987 to 1993. 
There are two binary response variables: fertility (indicating whether a woman had 
given birth to a child in a certain year) and employment (indicating whether she was 
employed). The covariates are race (dummy variable equal to 1 for a black woman), 
age (in 1986), education (year of schooling), child 1-2 (number of children in the family 
aged between 1 and 2 years, referred to the previous year), child 3-5, child 6-13, child 
14- and income of the husband (in dollars, referred to the previous year). 

In analyzing the dataset, the most interesting question concerns the direct effect of 
the covariates on the response variables. The approach considered here, allows us to 



separate these effects from the effect of the unobserved heterogeneity by modehng the 
latter by a latent process. In this way, we admit that the unobserved heterogeneity 
effect on the response variable is time-varying. 

On these data, we fitted a model formulated on the basis of assumption (9) with 
xf^ = I2 ® [xf^]' , where Id denoting an identity matrix of dimension d and the 
vector xf* includes the covariates indicated previously further to a dummy variable 
for each year. In particular, the logits may be parameterized as follows: 

,{t) 

log 3^ = a. + [-fY/3i, 

At) 

^i2,0\u 

whereas, for the log-odds ratio we have 

p(Y^ = l,Yi^ = l\ul^=u,.?) p(y!{^ = 0, Yi^ = 0|f/i*) = 

,__Cf^ .__c^^ "T ^"-"e) ■ 



■p(Y^ = l,YS^=0\U^^=u,.?) %(Y^I^=0,Y!i^ = l\ul^=u,x?) 
We implemented the proposed RJ algorithm with the following parameters setting 
and initialization. We used the prior distribution defined in Section 4.1 and 4.2 with 
Su = I, 6uv = k ■ I{u = v) + 0.6 ■ I{u 7^ v) and o"! = cr^ = 5, 10, with = (/3 U 7). 
The parameters used for the proposal distributions in the MH move were tx = 0.1, 
ta = 0.05, = 0.5, and ri, = 0.1. These values allowed us to obtain acceptance rates 
in the range 0.15-0.30. In the split /combine move we used = 2 for the Normal 
proposal and ax = = bx = h.§ = 1 for the Gamma distributions. The Markov chain 
was initialized from the maximum likelihood estimation obtained through the EM 
algorithm, with k = 1. Moreover, we ran the RJ algorithm for 1,000,000 iterations 
discarding the first 200,000 as burn-in. 

After the burn-in, the algorithm visited five states with posterior probabilities 
illustrated in Table 7. Table 8 also shows the acceptance rates for the different moves, 
whereas in Figure 2 and Figure 3 are illustrated the trace of k in the first 200,000 



iterations, after the burn-in, and the ergodic averages of the model probabihties, for 
both the choices of the prior parameters, o"! = cr^ = 5, 10. We can see that the 
algorithm leads to choose a model with a number of state between 4 and 5. The 
acceptance rates, especially for the split / combine move, are again a bit low, but this 
can be due to the complexity of the model. 
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0.081 


0.006 
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0.554 


0.298 
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0.320 


0.580 


6 


0.044 


0.107 


> 7 


0.003 


0.009 



Table 7: Posterior probabilities of the number of latent states under the LM model 
with individual covariates for a? = = 5, 10. 
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19.21 


18.64 


Transition probabilities 


18.18 


16.17 


i/ = (/3U7) 


16.13 


17.71 


iu 


25.34 


29.37 


Birth 


0.32 


0.35 


Death 


0.33 


0.36 


Split 


0.12 


0.11 


Combine 


0.12 


0.10 



Table 8: Acceptance rates for MH move, split/combine and birth/death move under 
the LM model with individual covariates for o"? = cr^ = 5, 10 



In Table 9 we show the estimates of the parameters, collected in vectors f3 and 
7, affecting the marginal logits of fertility and employment and the log-odds ratio 
between these variables, for a| = cr^ = 5 and k = 4 and for cr| = = 10 and k = 5. 
These estimates are straightforward to compute, through the ergodic averages of the 
draws obtained when the best model was visited. In Table 10 are also illustrated the 
same estimates computed by the ergodic means over all the draws (after discarding 
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Figure 2: (a) Number of latent states in the first 200,000 iterations after the hum-in 
(h) Cumulative occupancy fractions for /c = 3, 4, 5 under the LM model with individual 
covariates for a"^ = = 5 
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Figure 3: (a) Number of latent states in the first 200,000 iterations after the burn-in 
(b) Cumulative occupancy fractions for /c = 4, 5, 6 under the LM model with individual 
covariates for cr| = = 10 



the burn-in) without hmiting these averages to the draws obtained when the selected 
model was visited. This can be done since the parameters (3 and 7 do no depend 
by the number of states. In both the tables, we also show the 90%, 95% and 99% 
posterior credible intervals not containing zero. Note that some covariates have been 



standardized, before starting the estimation algorithm, for computing purposes. 
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intercept 
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t In standardized form 

* posterior 90%HPD not containing zero 
** posterior 95%HPD not containing zero 

* * * posterior 99%HPD not containing zero 



Table 9: Posterior estimates of the model parameters affecting the marginal logits for 
fertility and employment and the log- odds ratio, under the LM model with individual 
covariates 

On the basis of the estimates of the parameters for the covariates, we can see 
that the resuhs are very similar both if we take the means of the draws limited to 
the model of interest or if we take the overall means. Moreover these estimates are 
not influenced by the prior specification we used. In particular, age seems to have 
an effect on fertility but not on employment. At this regards we can consider that 
the women in the sample were aged between 18 and 47, which is a limited range of 
years if we want to effectively study the effect of aging on the probability of having 
a job. We also note that the education has a significant effect on both fertility and 
employment, whereas income of the husband affects only the logit of employment. 
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t In standardized form 

* posterior 90%HPD not containing zero 
** posterior 95%HPD not containing zero 

* * * posterior 99%HPD not containing zero 



Table 10: Posterior estimates of the model parameters, under the LM model with 
individual covariates, computed by the ergodic means over all the iterations 



Moreover, the number of children aged between 1 and 5 years has an effect on the 
employment while the number of children aged between 3 and 13 years affects the 
fertility. The log-odds ratio between the two response variables is negative and is 
significant based on the 90% posterior interval. This result can be interpreted as a 
negative association between the two response variables, referred to the same year. 

In order to estimate the value of the remaining parameters, and in order to face 
the label switching problem we applied the post-processing algorithm of Marin et al. 
(2005); moreover, we sorted the output of the algorithm on the basis of the drawn 
support points Ci, . . . , Once the post-processing algorithm has been performed we 
could compute the estimates of those parameters that are affected by the number of 



state on the basis of the ergodic averages, after discarding the draws obtained during 
the burn-in period. 

In Table 11 and 12 we show the results of this estimation procedure, through the 
estimates of the support points (one for the marginal logit of fertility and the other 
for that of employment) corresponding to each latent state, and the estimated initial 
probabilities and transition probability matrix, for both the hyperparameters chosen 
in the prior specification. Though the number of states selected is different, both 
the specifications lead to the same conclusions. In particular, the latent process can 
be interpreted as an error component which follows a process that may be seen as a 
discrete version of an AR{1). The support points are in increasing order on the basis 
of the marginal logit of employment; the latent states may therefore be interpreted 
as different levels to give birth to a child or to get a job position. For example, 
the first latent state corresponds to subjects with the highest propensity to fertility 
and the lowest propensity to employment. Moreover, it is interesting to observe that 
the transition matrix has an almost symmetric structure, with a large percentage of 
subjects that remains in the same latent state. 



Latent state 


Support points 
Fertility Empl. 


Initial prob. 


Transition probabilities 


1 


-1.796 -4.937 


0.092 


0.734 0.065 0.055 0.146 


2 


-1.936 -3.718 


0.102 


0.072 0.643 0.067 0.219 


3 


-2.648 -0.002 


0.228 


0.071 0.072 0.754 0.103 


4 


-2.609 5.980 


0.578 


0.021 0.027 0.009 0.944 



Table 11: Estimated support point for each latent state, estimated initial probabilities 
and estimated transition probability matrix for the LM model with individual covari- 
ates with a'^ = = cr'^ = 5 and k = 4 



Latent 


Support points 


Initial 












state 


Fertility 


Empl. 


prob. 




Transition probabilities 




1 


-1.902 


-5.461 


0.084 


0.553 


0.063 


0.090 


0.060 


0.234 


2 


-1.965 


-5.117 


0.103 


0.043 


0.754 


0.050 


0.052 


0.101 


3 


-3.180 


0.174 


0.194 


0.048 


0.052 


0.684 


0.163 


0.053 


4 


-2.117 


3.542 


0.090 


0.187 


0.081 


0.059 


0.555 


0.117 


5 


-2.634 


7.786 


0.530 


0.022 


0.014 


0.005 


0.021 


0.939 



Table 12: Estimated support point for each latent state, estimated initial probabilities 
and estimated transition probability matrix for the LM model with individual covari- 
ates with (j| = = cr^ = 10 and k = 5 

7 Conclusion 

In this paper, we proposed a framework for Bayesian inference on a class of LM 
models for categorical longitudinal data. We considered in particular the basic LM 
version, in which the latent Markov chain is of first-order and time homogeneous, 
and some extended versions which include constraints and individual covariates in 
the measurement model, which corresponds to the conditional distribution of the 
response variables given the latent states. 

The proposed inferential approach is based on a system of priors whose specifica- 
tion follows that adopted by Cappe et al. (2005) and Spezia (2010) for HM models. 
In particular, this system of priors is formulated on a transformation of the initial 
and transition probabilities which is equivalent to a system based on Dirichlet distri- 
butions. 

With the aim of estimating the model parameters and the number of latent states, 
we implemented an RJ algorithm that allows us to simultaneously draw samples from 
the posterior distribution of the parameters and the number of states. The choice 
of the system of priors leads to an algorithm easier to implement; in particular, the 
computation of the Jacobian of the transformation from the current value of the 
parameters to the new value is easier with respect to a system of priors based on 
Dirchlet distributions. The structure of the proposed RJ algorithm has many points 



in common with the RJ algorithms for mixture models of Richardson and Green 
(1997) and for HM models of Robert et al. (2000). In particular, our algorithm is 
based on moves of MH type, which update the parameters of the current model given 
the number of states, and moves of split /combine and birth/death type, aimed at 
also updating the number of states. 

The proposed approach can be extended in several ways and even applied to 
different LM formulations. We are referring, in particular, to the development of 
a similar Bayesian framework for the extended versions of the LM model in which 
individual covariates are included in the latent model. These covariates are then 
assumed to affect the initial and transition probabilities of the latent Markov chain. 
This extension would require a different formulation of the priors, based, for instance, 
on Normal distributions assumed on suitable transformations of these initial and 
transition probabilities. Natural transformations are based on multinomial logits. 

Moreover, it is possible to extend the proposed framework in order to deal with 
missing responses, that we can assume to be missing at random in the sense of Rubin 
(1976). Thus, the missing data mechanism is ignorable for posterior inference. It 
is possible to make the proposed RJ algorithm able to handle data with missing 
responses of this type. The missing data can be estimated along with the parameters 
of the LM model, through the steps of the algorithm. 

Other interesting extensions concern the implementation of an algorithm for path 
prediction, i.e. to predict the sequence of latent states of a subject on the basis of the 
observed data, and the Bayesian model averaging, in order to estimate parameters 
with invariable dimension with respect to the number of states (e.g., parameters for 
the covariates) and for prediction of the responses. 

Finally, an aspect that has to be remarked and that requires additional future 
work, concerns the sensibility of the inferential results on the prior specification. A 
first analysis has already been done in the two illustrative examples, showing that 



differences in the prior specification may lead to differences in the estimation of the 
number of latent states. However, further research is necessary in order to have a 
more conclusive answer about this issue. 
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